Weighted Co-Occurrence Bio-Term Graph for Unsupervised Word Sense Disambiguation in the Biomedical Domain
Abstract
Word Sense Disambiguation (WSD) is a significant and challenging task for text understanding and processing. This paper presents an unsupervised approach based on a Weighted Co-occurrence bio-Term Graph (WCOTG) for performing WSD in the biomedical domain. The graph is automatically created from terms that are extracted from a corpus of downloaded scientific abstracts. Two kinds of weights are introduced for the links of the built bio-term graph and are taken as important factors in the process of disambiguation. A modified Personalised PageRank (PPR) algorithm is used for performing WSD. When evaluated on the NLM-WSD and MSH-WSD test datasets and on an acronym set, the method outperforms widely used ones addressing the same problem, and its average result is almost equal to that of the BlueBERT_LE-based method. In contrast, our method has no additional enhancement or training of BERT-based models. Comparative experiments validate the positive effect of the links' weights on disambiguation efficiency. Last, the statistical relation among system accuracy, the number of medical abstracts in the corpus, and the corresponding [...] suggests an excellent minimum scale for the corpus when resources are limited.
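The pipeline the abstract outlines (a weighted co-occurrence graph over terms, followed by Personalised PageRank seeded with the context of the ambiguous word) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the two kinds of link weights or the modification to PPR, so the sketch assumes a single weight (the number of abstracts in which two terms co-occur) and standard PPR by power iteration. The function names, sense labels, and toy abstracts are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Return {term: {neighbour: weight}}.

    Assumption: the weight is the number of documents in which both
    terms occur. The paper's two weight types are not reproduced here.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Standard PPR by power iteration; teleport mass goes only to seeds."""
    nodes = list(graph)
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("no context term occurs in the graph")
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())  # > 0: every node has an edge
            for m, w in graph[n].items():
                # Spread rank along links in proportion to their weight.
                new_rank[m] += damping * rank[n] * w / total
        rank = new_rank
    return rank


def disambiguate(graph, context_terms, candidate_senses):
    """Pick the candidate sense node with the highest PPR score."""
    rank = personalized_pagerank(graph, context_terms)
    return max(candidate_senses, key=lambda s: rank.get(s, 0.0))


# Hypothetical toy corpus: each abstract is a list of extracted terms,
# with the senses of the ambiguous word "cold" already labelled.
toy_abstracts = [
    ["common_cold", "virus", "fever"],
    ["common_cold", "virus", "cough"],
    ["cold_temperature", "hypothermia", "exposure"],
    ["cold_temperature", "frostbite", "exposure"],
]
graph = build_cooccurrence_graph(toy_abstracts)
rank = personalized_pagerank(graph, ["virus", "fever"])
best = disambiguate(graph, ["virus", "fever"], ["common_cold", "cold_temperature"])
print(best)
```

In this sketch the context terms act as the personalisation vector, so rank flows only to senses that are reachable from them through weighted links; a sense with stronger co-occurrence ties to the context receives a higher score.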
Similar Resources
Word Sense Disambiguation in biomedical ontologies with term co-occurrence analysis and document clustering
With more and more genomes being sequenced, a lot of effort is devoted to their annotation with terms from controlled vocabularies such as the GeneOntology. Manual annotation based on relevant literature is tedious, but automation of this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as 'development' can refer to developmental biology or to ...
Graph Connectivity Measures for Unsupervised Word Sense Disambiguation
Word sense disambiguation (WSD) has been a long-standing research objective for natural language processing. In this paper we are concerned with developing graph-based unsupervised algorithms for alleviating the data requirements for large scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most “important” node among the set of graph nodes repre...
Graph-based Centrality Algorithms for Unsupervised Word Sense Disambiguation
This thesis introduces an innovative methodology of combining some traditional dictionary based approaches to word sense disambiguation (semantic similarity measures and overlap of word glosses, both based on WordNet) with some graph-based centrality methods, namely the degree of the vertices, Pagerank, closeness, and betweenness. The approach is completely unsupervised, and is based on creatin...
Unsupervised Domain Relevance Estimation for Word Sense Disambiguation
This paper presents Domain Relevance Estimation (DRE), a fully unsupervised text categorization technique based on the statistical estimation of the relevance of a text with respect to a certain category. We use a pre-defined set of categories (we call them domains) which have been previously associated to WORDNET word senses. Given a certain domain, DRE distinguishes between relevant and non-r...
Self-training and co-training in biomedical word sense disambiguation
Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction, attempting to select the proper sense of ambiguous words. Due to the scarcity of training data, semi-supervised learning, which profits from seed annotated examples and a large set of unlabeled data, are worth researching. We present preliminary results of two semi-supervised learnin...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3272056